Introduction

Chaperones increase the rate of correct folding by binding newly synthesized polypeptides before they are completely folded. They prevent the formation of incorrectly folded intermediates that may trap the polypeptide in an aberrant form. Chaperones work by binding to exposed hydrophobic patches on misfolded or incompletely folded proteins and hydrolyzing ATP. Hsp70 acts early in protein folding, binding to polypeptide chains as they emerge from ribosomes wherever there is a stretch of 7 hydrophobic amino acids. The motif often recognized by BiP, an Hsp70 chaperone localized to the lumen of the endoplasmic reticulum, takes the form: Hy(W/X)HyXHyXHy

Hy: large hydrophobic amino acid [Trp, Leu, Phe]; W: Trp; X: any amino acid
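As a rough illustration, the motif can be turned into a regular expression and scanned along a sequence. This is only a sketch: it assumes that Hy is limited to the three residues listed in the legend (Trp, Leu, Phe), and the names `BIP_MOTIF` and `find_motif_starts` are made up for this example.

```python
import re
from typing import List

# Assumption: "Hy" is restricted to the residues named in the legend (W, L, F).
HY = "[WLF]"

# Hy (W/X) Hy X Hy X Hy: "Trp or any residue" reduces to "any residue" (.) in a regex.
# The lookahead lets overlapping 7-residue windows be reported as separate hits.
BIP_MOTIF = re.compile(f"(?={HY}.{HY}.{HY}.{HY})")


def find_motif_starts(sequence: str) -> List[int]:
    """Return the 0-based start index of every 7-residue window matching the motif."""
    return [match.start() for match in BIP_MOTIF.finditer(sequence.upper())]


# Hypothetical test sequence, not taken from any real protein.
hits = find_motif_starts("AAFWLALGFKK")
```

In this made-up sequence the only window that fits starts at index 2 (`FWLALGF`), where F, L, L and F occupy the four Hy positions.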

Developments as therapeutics: a ligand binds the target protein and is also attached to a small peptide that is recognized by a ubiquitin ligase, so the target is directed to the proteasome for degradation. These molecules are PROTACs [PROteolysis TArgeting Chimeras].

AIM: steer a protein towards a degradation pathway.

AGGREGATED PROTEINS: More specifically, we are trying to design peptides that can redirect the small oligomers that are responsible for protein buildup and the formation of aggregates.

We are now looking for peptides that can attract chaperone proteins, specifically Hsp70 and potentially Hsc70, because Hsc70 is involved in chaperone-mediated autophagy. [Aggregates are usually clumpy and large, so it would be difficult to target them through normal protein degradation pathways.]

4po2SBD

Future work: LC3 adapter protein for autophagy of aggregates

Looking towards computational techniques to explore the potential peptide space.

Alanine Mutagenesis

Start by mutating every position of the substrate ‘NRLLLTG’ to alanine: ‘AAAAAAA’. FoldX removes the 1st index, leaving ‘AAAAAA’. Then, at each position, mutate to every other amino acid.
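The enumeration step can be sketched in a few lines. This only generates the sequences to be scored; the FoldX modelling and energy evaluation are not shown, and the helper name `single_point_mutants` is an assumption of this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def single_point_mutants(template):
    """Return every sequence that differs from `template` at exactly one position."""
    mutants = []
    for pos, original in enumerate(template):
        for aa in AMINO_ACIDS:
            if aa != original:
                mutants.append(template[:pos] + aa + template[pos + 1:])
    return mutants


substrate = "NRLLLTG"
alanine_template = "A" * len(substrate)   # 'AAAAAAA'
template = alanine_template[1:]           # first index dropped, as described for FoldX: 'AAAAAA'
mutants = single_point_mutants(template)  # 6 positions x 19 substitutions
```

For the hexamer this gives 6 × 19 = 114 single-point variants to evaluate.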

Hill Climbing

Entirely deterministic

Trial 1

Starts with the alanine substrate and mutates each index to all other amino acids. Those above a particular threshold (the minimum, a poor choice on my part) were then mutated in their second position to every other amino acid…

trial1HC
PROBLEMS:
  • convergence to a local minimum is guaranteed, but not to the global optimum
  • the number of PDBs that need to be explored grows exponentially with the number of positions mutated

Trial 2

Starting from the binding pocket (index 432567), mutate each position to all other amino acids. Those above the mean for that position are considered for the next round of mutations. Results show charged amino acids flanking a hydrophobic center!

trial2HC
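A minimal sketch of the Trial 2 procedure follows, under several assumptions that are not confirmed by the notes: "above the mean" is read as "more favourable (lower energy) than the mean", the index string 432567 is read as 1-based heptamer positions mapped onto 0-based hexamer indices, and a cap on surviving candidates (`max_candidates`) is added here to keep the search from growing exponentially. `toy_energy` is a placeholder score, not FoldX.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(peptide):
    """Placeholder score (NOT FoldX): each hydrophobic residue lowers the 'energy' by 1."""
    return -float(sum(aa in "LIVFMW" for aa in peptide))


def hill_climb(start, order, energy, max_candidates=50):
    """Deterministic position-by-position search; lower energy is treated as better."""
    candidates = [start]
    for pos in order:
        scored = {}
        for seq in candidates:
            for aa in AMINO_ACIDS:
                variant = seq[:pos] + aa + seq[pos + 1:]
                scored[variant] = energy(variant)
        mean_energy = sum(scored.values()) / len(scored)
        # Keep only variants that beat the mean for this position.
        survivors = sorted((s for s in scored if scored[s] < mean_energy), key=scored.get)
        if not survivors:  # all variants tied: fall back to a single best one
            survivors = [min(scored, key=scored.get)]
        candidates = survivors[:max_candidates]  # assumed cap, not part of the notes
    return sorted(candidates, key=energy)


# Assumed mapping of positions 4,3,2,5,6,7 onto hexamer indices (position - 2).
POSITION_ORDER = [2, 1, 0, 3, 4, 5]
results = hill_climb("AAAAAA", POSITION_ORDER, toy_energy)
```

Because every step is a fixed enumeration followed by a fixed threshold, repeated runs give identical results, which is also why the search cannot escape whatever basin the starting sequence and position order lead it into.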

Genetic Algorithm

An optimization process in which the aim is to improve the ability of individuals to survive (on some metric).

Representation of the Solutions: hexamer (6 aa) peptides. An evolutionary algorithm utilizes a population of individuals that represent a set of possible solutions.

Fitness Function: maps a representation of the solution to a scalar value. In my implementation an implicit fitness remapping is used: the FoldX analysecomplex Interaction Energy. The value of the fitness function represents how close the solution is to the optimal solution (more later).

  • crossover probability \(p_c=0.8\)
  • mutation rate \(p_m=0.3\)
  • elitism rate=0.6
  • nvar=6
  • maxFitness=-12
  • population size=100
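One way to keep these settings together is a small configuration object. This is a sketch only: the class name `GAConfig` and the field names are invented here, `nvar` is assumed to be the peptide length, and the exact role of `maxFitness` is not spelled out in these notes, so it is stored as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAConfig:
    crossover_probability: float = 0.8  # p_c
    mutation_rate: float = 0.3          # p_m
    elitism_rate: float = 0.6
    nvar: int = 6                       # assumed: number of positions in the peptide
    max_fitness: float = -12.0          # value from the notes; its role is not specified
    population_size: int = 100

    def n_elites(self):
        """Number of individuals carried over unchanged each generation."""
        return round(self.elitism_rate * self.population_size)


config = GAConfig()
```

With the listed values, 60 of the 100 individuals would be carried over per generation.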

Initial population: 100 hexamer peptides. The goal is to ensure that the initial population is a uniform representation of the entire search space. Improvements: if prior knowledge of the search space is available, we can use it to bias the initial population. Limitations: early convergence, and not all of the search space will be explored.

Sample Size: affects accuracy and convergence. Small size: time complexity per generation is lower, but more generations may be needed to converge. Improvements: force smaller populations to explore a larger search space by increasing the rate of mutation (explained later). Large size: covers a larger area of the search space, but time complexity increases.
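A uniform initial population can be sketched by drawing each residue independently and with equal probability. The function name and the `seed` argument are assumptions of this example, added so that a run can be reproduced.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_population(size=100, length=6, seed=None):
    """Draw `size` peptides with every residue sampled uniformly at each position."""
    rng = random.Random(seed)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(size)]


population = random_population(size=100, length=6, seed=0)
```

With 20^6 = 64,000,000 possible hexamers, 100 individuals sample only a tiny fraction of the space, which is one reason early convergence is a concern.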

Operational functions

Roulette wheel Selection: The fitness of a peptide influences its probability of being selected. The chance of an individual being chosen is proportional to its fitness.

\[p_i=\frac{f_i}{\sum_{j=1}^{N} f_j}\]
A random number \(\zeta \in [0,1]\) is generated; if \(\zeta \le p_i\), individual \(i\) is selected.
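The sketch below shows fitness-proportional selection in its common cumulative-sum form, which may differ in detail from the acceptance rule written above. Because interaction energies are negative, it assumes a remapping \(f_i=-E_i\) so that more favourable peptides get larger fitness; that remapping and the function names are assumptions of this example, not the author's implementation.

```python
import random
from itertools import accumulate


def selection_probabilities(energies):
    """Map energies to p_i = f_i / sum(f_j), with the assumed remapping f_i = -E_i."""
    fitness = [max(-e, 0.0) for e in energies]  # positive energies get zero fitness
    total = sum(fitness)
    if total == 0:
        return [1.0 / len(energies)] * len(energies)
    return [f / total for f in fitness]


def roulette_select(population, energies, rng):
    """Pick one individual; the wheel slice for each is proportional to its p_i."""
    cumulative = list(accumulate(selection_probabilities(energies)))
    zeta = rng.random()
    for individual, upper_edge in zip(population, cumulative):
        if zeta <= upper_edge:
            return individual
    return population[-1]  # guards against floating-point round-off


# Illustrative energies only, chosen to make the arithmetic easy to follow.
probs = selection_probabilities([-18.0, -12.0, -6.0])
picked = roulette_select(["AAAAAA", "CCCCCC", "DDDDDD"], [-18.0, -12.0, -6.0], random.Random(1))
```

For these illustrative energies the fitness values are 18, 12 and 6, so the selection probabilities are 1/2, 1/3 and 1/6.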

Elitism: selection of individuals from the current generation to survive to the next generation. The number of individuals that survive to the next generation without mutation is referred to as the generation gap. If gap=0, the new generation consists entirely of new individuals. The best k (0.6) individuals survive to the next generation, to ensure that the max fitness does not decrease.
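A minimal sketch of the elitism step, assuming that "best" means lowest interaction energy and that the elitism rate is applied as a fraction of the population. The function name is hypothetical; the five peptides and energies are the top entries of the HSP70 results in these notes, given here in shuffled order.

```python
def select_elites(population, energies, elitism_rate=0.6):
    """Return the best fraction of the population, lowest energy first."""
    n_keep = round(elitism_rate * len(population))
    ranked = sorted(zip(energies, population))  # ascending energy = best first
    return [peptide for _, peptide in ranked[:n_keep]]


peptides = ["VWLMPL", "RWYIPR", "RWFIEY", "VWYIPL", "RWLMPL"]
energies = [-17.67277, -18.18153, -17.51690, -18.01177, -18.08830]
elites = select_elites(peptides, energies, elitism_rate=0.6)
```

Since the best individual is always among the survivors, the best energy in the population cannot get worse from one generation to the next.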

Two point Cross-over: the aim is to produce offspring from two parents chosen by the selection operator. Not every pair of parents necessarily produces offspring. Crossover occurs with probability \(p_c=0.8\): if a random number \(\zeta \in [0,1]\) satisfies \(\zeta \le p_c\), there is a crossover event. In my implementation, each crossover event results in 2 offspring. Two positions are randomly selected and the substrings between these points are swapped.
crossover2pt
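The swap can be sketched as follows. The function name, the choice to return the parents unchanged when no crossover event occurs, and the way the two cut points are drawn are assumptions of this example.

```python
import random


def two_point_crossover(parent_a, parent_b, rng, p_c=0.8):
    """With probability p_c, swap the segment between two random cut points."""
    if rng.random() > p_c:
        return parent_a, parent_b  # no crossover event: offspring copy the parents
    # Two distinct cut points in 0..len, sorted so that i < j.
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


rng = random.Random(42)
child_a, child_b = two_point_crossover("RWYIPR", "VMHIPL", rng, p_c=1.0)
```

At every position the two children together carry exactly the two parental residues, so crossover recombines existing material but never creates a residue that neither parent had at that position.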

Random Mutation: the aim of mutation is to introduce new genetic material into an existing individual and so add diversity. Mutation and crossover are the operational functions that allow exploration of a large range of possible solutions. Mutations occur at the rate \(p_m=0.3\). Usually, small values are used for the mutation rate to ensure that the resulting mutated children are not distorted too much (more later). If a random number \(\zeta \in [0,1]\) satisfies \(\zeta \le p_m\), the individual is mutated.

random_mutation
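A sketch of one possible reading of the mutation rule: the rate is applied once per individual, and a mutation event replaces a single randomly chosen position with a different residue. Whether the original implementation applies the rate per individual or per position is not stated here, so this is an assumption, as is the function name.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_mutation(peptide, rng, p_m=0.3):
    """With probability p_m, replace one random position with a different residue."""
    if rng.random() > p_m:
        return peptide  # no mutation event
    pos = rng.randrange(len(peptide))
    choices = [aa for aa in AMINO_ACIDS if aa != peptide[pos]]
    return peptide[:pos] + rng.choice(choices) + peptide[pos + 1:]


rng = random.Random(7)
mutant = random_mutation("RWYIPR", rng, p_m=1.0)
```

Excluding the current residue from the choices guarantees that a mutation event always changes the sequence.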

Results 10 generations

y-axis: Interaction_Energy; x-axis: # PDB / peptides

  • Methionines show up in most of the potential peptides. We can check sequence logos to look at sequence composition.
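Before drawing a sequence logo, the per-position composition can be checked with a simple count. The sketch below uses the five lowest-energy HSP70 peptides reported in these notes as input; the function name is hypothetical.

```python
from collections import Counter


def position_counts(peptides):
    """Return one Counter per position, counting residues across equal-length peptides."""
    length = len(peptides[0])
    return [Counter(peptide[i] for peptide in peptides) for i in range(length)]


top_hsp70 = ["RWYIPR", "RWLMPL", "VWYIPL", "VWLMPL", "RWFIEY"]
counts = position_counts(top_hsp70)
```

In this small subset, Trp occupies the second position in all five peptides, while the first position is split between Arg (3) and Val (2). Dividing each count by the number of peptides gives the frequencies that a logo would display.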

y-axis: Interaction Energy x-axis: Generations

Adding Methionine +1

HSP70

Create a dataframe of all the peptides; note that there will be duplicate sequences.
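Collapsing duplicates into one row per sequence can be sketched with the standard library alone; a pandas `groupby` would express the same idea. This example assumes the energy reported for a duplicated sequence is the mean over its copies, which the notes do not state, and the input records are placeholders rather than real results.

```python
from collections import defaultdict


def summarize(records):
    """Group (peptide, energy) pairs into (peptide, frequency, mean energy) rows."""
    grouped = defaultdict(list)
    for peptide, energy in records:
        grouped[peptide].append(energy)
    rows = [(pep, len(vals), sum(vals) / len(vals)) for pep, vals in grouped.items()]
    return sorted(rows, key=lambda row: row[2])  # most favourable energy first


# Placeholder records for illustration only; not taken from the runs below.
records = [("AAAAAA", -2.0), ("CCCCCC", -5.0), ("AAAAAA", -4.0)]
table = summarize(records)
```

Each row then matches the column layout used in the tables: sequence, number of occurrences, and interaction energy.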

peptides freq_peptides interactionEnergy
RWYIPR 2 -18.18153
RWLMPL 1 -18.08830
VWYIPL 2 -18.01177
VWLMPL 1 -17.67277
RWFIEY 2 -17.51690
EWYIEF 1 -17.40450
RWYLHP 1 -17.33623
FWYLMP 1 -17.29543
EWFIEF 1 -17.27043
EWFIDF 2 -17.00780
MWYLMP 3 -16.96713
EWFMEF 1 -16.96180
RWFKEY 1 -16.79350
VWDLRY 2 -16.67823
VWHMPL 1 -16.63530
FWFLMP 1 -16.60080
EWFIGF 1 -16.29400
VWYLEA 1 -16.21840
VWFIEL 1 -16.19760
RALMKY 1 -16.16453
KCYLRY 1 -16.11610
EWDLRF 1 -15.98983
VWDMPL 1 -15.93440
VWDLRP 1 -15.92400
VWEIMP 1 -15.77400
RILMRR 1 -15.76753
VWYICA 1 -15.73077
VWLMHP 1 -15.71137
RILICL 1 -15.67273
NWFICL 1 -15.67257
LKHMPP 1 -15.65437
LFHMPL 1 -15.57600
VWDLMY 3 -15.57410
NWFIEL 1 -15.49953
RWDLMY 1 -15.47767
RCFIEY 1 -15.42807
VLKIMP 2 -15.38503
VWFMEK 1 -15.37583
RAYMPR 1 -15.37110
VWYLCA 1 -15.36723
VWFKEP 1 -15.36190
MILIMP 1 -15.29850
VAYMPL 1 -15.28110
RAFMPR 1 -15.22777
VWFMCA 1 -15.22103
VWFMGL 1 -15.19027
AFYIDF 1 -15.15043
RILLHR 1 -15.07613
FWYLET 1 -15.07257
VMHIPL 6 -15.05593
MILMMP 1 -14.97693
RILMCL 1 -14.97257
RCYIGY 1 -14.88833
NWFMGL 1 -14.86623
VMHMPL 1 -14.86353
RILMHR 2 -14.81973
VWDLMP 1 -14.80067
VMHMFY 2 -14.75083
TMHMPL 2 -14.72100
RCFKPL 1 -14.65093
VWDMMP 2 -14.64550
NWYLEC 1 -14.61237
VILMCL 2 -14.58650
VMHMFP 2 -14.15240
VWDLEA 1 -14.12530
KFDLHY 1 -13.97017
VWDLEC 1 -13.46173
RCDLRY 6 -13.38167
NMHICL 1 -13.16787
CRLMDT 4 -13.05393
VMDMPL 1 -12.57067

DNAK

peptides freq_peptides interactionEnergy
DPRLFPW 1 -16.50707
QFMMFPY 2 -16.21813
QLMMFPD 1 -16.06513
ILMMFPD 1 -16.00513
VMPLFPR 1 -15.74783
QFLLPIY 1 -15.71213
NPLLPII 1 -15.58663
LMPLFIR 1 -15.53827
NPMMFPI 1 -15.53333
VMPLFIR 1 -15.46510
DPFLFPW 1 -15.38390
LFLIIFV 5 -15.22323
DFRMTPW 1 -15.06197
VDPLFPR 1 -15.02960
EIPLFPA 1 -14.98060
NLRLWIY 1 -14.96173
VLMLFTR 2 -14.93977
DIPLFPW 1 -14.87653
APFIFIV 1 -14.75227
KVRLWIY 5 -14.70183
QAMMFPD 1 -14.65450
QKLVFPD 1 -14.61573
LKPLFIV 1 -14.58803
QPRLWID 3 -14.53497
QLMMIYD 1 -14.47693
WVSMFPR 1 -14.43687
AMPLFID 1 -14.40473
KARLWIY 1 -14.40053
EWFMNYI 1 -14.32020
LGRIIFV 1 -14.30933
QWGMPIY 1 -14.27170
WDPLFPA 1 -14.26513
WEILWIY 1 -14.25403
QFLKPIY 1 -14.24707
VAPLFIR 2 -14.20760
KMRKILY 1 -14.17833
APFMWPV 1 -14.12367
QLRVIFD 1 -14.08600
CERLWIH 1 -14.05417
VDPLFIR 1 -14.03677
KVRMNLH 1 -14.03007
KLRKWIY 1 -14.02397
LKLVIFV 1 -14.00563
KPFMSYI 1 -13.95157
QLFMSYD 1 -13.87347
DARLTPW 1 -13.83487
KARLSYI 1 -13.81090
LRLVIFV 1 -13.80523
WDPKFPA 1 -13.78527
ILRVIFD 2 -13.75590
DLRKTPW 1 -13.74073
NARLWIY 1 -13.64490
WPIMNLH 2 -13.59007
DAPLFPW 1 -13.58643
WEIMNLH 4 -13.57550
CSMMIFQ 1 -13.57027
NARLWLY 1 -13.55043
DFRKTPW 1 -13.43020
QEMLWID 4 -13.39407
PFLKYIK 1 -13.37763
KAFLSYI 3 -13.13740
KEIMNYI 2 -13.09220
KARKWIY 1 -12.49083
QARVIFD 2 -12.31050
IARVIFD 1 -12.29057
WDPMSYI 1 -12.26897
CSMKIFQ 1 -12.24807
NAMMSYI 2 -12.19043
CSFMSFQ 1 -12.15817
PGRKYLK 1 -12.05280
PALKYLK 2 -11.97247
WARLSLA 1 -11.74531
PGRVIFD 1 -11.50383

HSC70

peptides freq_peptides interactionEnergy
YFLMLP 1 -16.29440
FMFMPL 2 -16.04083
MILMHF 1 -15.94077
YWHMLL 1 -15.72873
FFMMLL 1 -15.70983
LKHMLF 1 -15.66133
FFFMML 1 -15.61387
SFLMLL 2 -15.32073
RLMMHL 1 -15.14853
QWYLRP 1 -15.13430
MWYMDL 1 -15.09993
SMFMLL 1 -15.02673
FFHMLL 1 -14.99177
FCFMQY 2 -14.98643
MIFMGF 1 -14.98173
VFLMGF 2 -14.97930
LWIIPL 1 -14.92187
FKYMLL 1 -14.89593
MIFMGY 2 -14.85770
RMFMGY 1 -14.85200
RCFMRY 1 -14.82190
FFHMML 1 -14.82007
LWIIMP 6 -14.81893
LKEIMF 1 -14.79217
LKEILF 3 -14.77560
LKHMLL 1 -14.72487
RLHMHL 1 -14.68263
DFHMPL 1 -14.67963
RFLMHP 1 -14.63247
YWEIMP 1 -14.61420
FSLMLL 1 -14.61133
YFLMHP 1 -14.56273
MKYLRP 1 -14.52503
SFHMPL 2 -14.49313
SFHMLP 1 -14.46367
MFHLRP 1 -14.45827
FCFMLL 3 -14.44907
SILMTY 1 -14.43263
MIPKLF 1 -14.42687
SFHMMP 1 -14.39320
VFFMMG 2 -14.38337
GWIIMY 4 -14.37040
LIHIMP 1 -14.34927
RILMAP 1 -14.34577
GWYMDL 1 -14.28900
DCFMPL 1 -14.21483
SFHMLL 1 -14.20200
MLMMHS 1 -14.17323
VKEIMF 1 -14.13463
LKYLRP 1 -14.09337
SWPLLL 4 -14.04427
LFHMLG 2 -14.02817
RFHMGY 1 -14.02390
RILMHP 1 -13.97610
LKEILP 1 -13.95940
MIPLHF 1 -13.92307
FMFMGL 1 -13.91030
LKEIMP 1 -13.85153
NWLMCT 2 -13.83833
EVHKLF 1 -13.79123
MILMCT 1 -13.74703
GKYLRY 1 -13.73467
NWPLTF 1 -13.70810
RCFMGY 1 -13.66923
RLMMDS 1 -13.60943
FCFMGY 1 -13.56093
FFHMGL 1 -13.12447
SCHMLL 1 -13.09980
RCFMCL 1 -13.02253
RCHMGY 1 -12.82293
SCFMGY 1 -12.78020
LKYLGP 2 -12.59557
VILMTS 1 -12.47337
DCEIPL 1 -12.30853
VFPMGS 1 -10.99050